WHAT: An XSLT-Based Infrastructure For The Integration Of Natural Language Processing Components
Abstract
The idea of the Whiteboard project is to integrate deep and shallow natural language processing components in order to benefit from their synergy. The project produced the first fully integrated hybrid system, consisting of a fast HPSG parser that utilizes tokenization, PoS, morphology, lexical, named entity, phrase chunk and (for German) topological sentence field analyses from shallow components. This integration increases robustness, guides the search and hence reduces the processing time of the deep parser. In this paper, we focus on one of the central integration facilities, the XSLT-based Whiteboard Annotation Transformer (WHAT), report on the benefits of XSLT-based NLP component integration, and present examples of XSL transformation of shallow and deep annotations used in the integrated architecture. The infrastructure is open, portable and well suited for, but not restricted to, the development of hybrid NLP architectures as well as NLP applications.
Similar resources
Integrating deep and shallow natural language processing components: representations and hybrid architectures
We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is improving robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processin...
Techniques for Text Planning with XSLT
We describe an approach to text planning that uses the XSLT template-processing engine to create logical forms for an external surface realizer. Using a realizer that can process logical forms with embedded alternatives provides a substitute for backtracking in the text-planning process. This allows the text planner to combine the strengths of the AI-planning and template-based traditions in na...
Interpreting Imperative Programming Languages in Extensible Stylesheet Language Transformations (XSLT)
We use XSLT to implement an interpreter for a simple XML based imperative programming language called “XIM.” Our work shows that not only is it theoretically possible to use XSLT as a programming language processor, but also that this is practically feasible. This has potential application in the area of delivering executable content over the Internet.
Examination of Authors' Stylistic Elements of Electronic Messages based on Researched Studies
Identifying the author is an important issue in natural language processing and text classification. It shows the author's characteristics in various texts. The rapid development of the Internet has made Web-based tools such as email and blogs, used under an anonymous identity, a popular method of communication for perpetrators. Moreover, it creates some specific security issues. In this paper, we ...
Design and Development of Early Warning System for Desertification and Land Degradation
Early warning systems are key components of strategies to reduce risk. This research, by adopting a systematic approach in the management of the risk of desertification and by including previously developed models and systems, offers an integrated efficient structure in terms of early warning for the risk of desertification as a pilot system for semi-arid areas of west Golestan Province in IRAN...
Publication date: 2003